perm filename 780.TIM[TIM,LSP]2 blob
sn#617894 filedate 1981-10-08 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00005 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00002 00002 ∂07-May-81 2350 ROD
C00011 00003 ∂06-Apr-81 1153 RPG Times
C00012 00004 ∂29-Sep-81 1104 CSVAX.fateman at Berkeley Re: Timing
C00014 00005 ∂05-Oct-81 1245 SL
C00017 ENDMK
C⊗;
∂07-May-81 2350 ROD
To: TOB, RPG, BIS
Date: 6 May 1981 2331-PDT
From: KASHTAN
Subject: VAX-11/750 <--> VAX-11/780 benchmarks
To: quam, witkin, hanson, jirak, wilcox, meyers, larson, kennard, sad,
heathman at SRI-AI, ryland at SRI-AI, burback at SRI-AI, mcghie,
sword
Here are the complete results of the 11/750 - 11/780 benchmarks. Looks
like the 11/750 gets to memory faster (and is optimized w.r.t. getting
to memory faster) than the 11/780. It loses VERY badly when it comes to
actually executing instructions, as the execution unit is very much slower
in the 750 than the 780. This is particularly borne out by the execution
benchmarks for the convolution program in various languages. The languages
vary from BLISS (which keeps the whole world in registers) to LISP (which
keeps the whole world in memory). Even though the 750 gets to memory faster,
it doesn't do you much good when it takes so long to process what you got
from memory (even a simple move).
The 750 does a good job of operand processing (especially given its relative
CPU speed) but this doesn't seem to help too much in actual program execution,
as on the 750 the execution time seems to be dominated by the instruction
execution time rather than by the operand fetch time (as is the case on
the 780).
A note on Richard Fateman's 750 benchmarks. It seems that all they did was
run a Liszt (Franz Lisp compiler) compile on one of Bell Labs' UNIX systems.
A compiled Franz Lisp program (as Liszt is) tends to be very heavy on CALLS
and on moving things around in memory (i.e. to and from the stack). No
intermediate results are kept in registers at all. What this does is skew
the results somewhat towards a faster looking 750 (since the 750 will benefit
from any benchmarks that are heavily involved in memory referencing). What
he reported was that the 750 was indeed about 60% of the 780 in this case.
PLEASE NOTE that large IU and VLSI programs, while we might consider them
memory intensive, are really virtual memory intensive (i.e. have very large
working sets). This is not the same as the above benchmark. Most IU and
VLSI programs when compiled with good compilers will tend to do a small amount
of computation (even just an add or multiply) with each datum fetched from
memory. You can expect the performance of the 750 relative to the 780 to
drop quite a bit from the above mentioned 60%. It should become very much
like the following convolution benchmarks (a very good example of a virtual
memory intensive program that does a small amount of computation with each
datum fetched). An interesting side note: CARs and CDRs in compiled lisp
tend to come out as "movl x(r),dst" (which executes at about 60% of 780
speed).
My feeling from playing with the two systems is that the 750 is best used
as an entry level system for those sites which need to acquire the smallest
possible VAX configuration (i.e. the lowest possible price). An entry level
750 goes for about $90K while an entry level 780 system with approximately
the same configuration would go for about $140K. Clearly there is a big
difference here (almost all of it in the price of the CPU). As the systems
get larger the price advantage goes away (as the price will not be dominated
by the CPU price, which is the case in the smaller systems, but by memory /
peripheral prices). Here you will save about $50K on a $250K system and get
less than 1/2 the machine.
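
The price argument above can be worked through as a small calculation. This
is only a rough sketch: the prices are the round figures quoted in this
message, the $250K figure is read as the price of the larger 780 system, and
the relative speed of 0.45 is an assumption borrowed from the BLISS-32
convolution line in the table below. The function name is made up for the
example.

```python
# Rough price/performance comparison using the round figures quoted above.
# ASSUMPTION: 0.45 (750 speed as a fraction of 780 speed) comes from the
# BLISS-32 convolution result; it is not a measured system-wide figure.
# ASSUMPTION: the "$250K system" is the 780 configuration, so the matching
# 750 configuration is taken to cost $250K - $50K.

def cost_per_780_unit(price_k, relative_speed):
    """Price in $K needed to buy one '780-equivalent' of compute."""
    return price_k / relative_speed

ASSUMED_750_SPEED = 0.45

entry_750 = cost_per_780_unit(90, ASSUMED_750_SPEED)
entry_780 = cost_per_780_unit(140, 1.0)
large_750 = cost_per_780_unit(250 - 50, ASSUMED_750_SPEED)
large_780 = cost_per_780_unit(250, 1.0)

print(f"entry level:  750 ${entry_750:.0f}K vs 780 ${entry_780:.0f}K per 780-equivalent")
print(f"large system: 750 ${large_750:.0f}K vs 780 ${large_780:.0f}K per 780-equivalent")
```

Under these assumptions the 750 costs more per unit of delivered compute in
both configurations, and the gap widens as the system grows.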
I am somewhat confused by the divide instruction timings. There are a couple
of possibilities here:
   1) a stupidity in the 780 was fixed in the 750
   2) I muffed the 780 test (don't think so, as I triple checked it)
   3) I muffed the 750 test.
I find it incredible that MULL is 4x as fast on the 780 while DIVL is a bit
slower on the 780. I did not do any floating point tests, as there is no
floating point accelerator on the 750.
David
-------------------------------------------------------------------------------
VAX-11/750 vs VAX-11/780
------------------------
Simple 2D convolution program:

                        11/750      11/750 (% of 11/780)      11/780
                        ------      --------------------      ------
BLISS-32                5.45 sec            45%                2.5 sec
VMS PASCAL              12.9 sec            38%                4.9 sec
UNIX C                  11.3 sec            44%                5.0 sec
UNIX F77                39.9 sec            29%               11.4 sec
Compiled Franz Lisp     76.5 sec            53%               41.0 sec
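
The middle column is the ratio of the two run times. A minimal sketch of that
arithmetic follows, with the times and reported percentages copied from the
table above; the variable and function names are made up for the example.

```python
# Convolution run times in seconds, copied from the table above:
# language -> (11/750 time, 11/780 time, reported 750 speed as % of 780)
times = {
    "BLISS-32":            (5.45, 2.5, 45),
    "VMS PASCAL":          (12.9, 4.9, 38),
    "UNIX C":              (11.3, 5.0, 44),
    "UNIX F77":            (39.9, 11.4, 29),
    "Compiled Franz Lisp": (76.5, 41.0, 53),
}

def relative_speed(t750, t780):
    """750 speed as a percentage of 780 speed (ratio of run times)."""
    return 100.0 * t780 / t750

for language, (t750, t780, reported) in times.items():
    computed = relative_speed(t750, t780)
    print(f"{language:20s} computed {computed:5.1f}%   reported {reported}%")
```

The computed values agree with the reported column to within one percentage
point.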
Instruction timings:

movl r,r                1000nSec            40%                400nSec
movl x(PC),r            1760nSec            45%                800nSec
movl r,x(PC)            2300nSec            52%               1300nSec
movl (r),r              1330nSec            60%                800nSec

Addressing modes:

r                          0nSec            --                   0nSec
# (short)                  0nSec            --                   0nSec
# (long)                 700nSec            57%                400nSec
(r)                      330nSec           120%                400nSec
(r)+                     330nSec           120%                400nSec
-(r)                     330nSec           120%                400nSec
@(r)+                    900nSec           111%               1000nSec
x(r)                     500nSec            80%                400nSec
@x(r)                   1150nSec            86%               1000nSec
[r]                     1000nSec            60%                600nSec

Instructions:

MOVL, ADDL, SUBL, etc   1000nSec            40%                400nSec
MULL                    8000nSec            25%               2000nSec
DIVL                    8000nSec           112%               9000nSec
CALLx/RET              20000nSec +         100%              15000nSec +
                        1800nSec/register                     2000nSec/register
JSB/RSB                 6000nSec            50%               3000nSec
SOBGxx                  2000nSec            50%               1000nSec
ACBL                    5600nSec            71%               4000nSec
MOVC3                    350nSec/byte      107%                375nSec/byte
3 operand instructions  +500nSec            40%               +200nSec
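
The instruction and addressing-mode figures above look additive: a base
instruction time plus a cost for the source operand's addressing mode. A
small sketch of that reading follows. The additive model is an assumption
suggested by the numbers, not something stated in the message, and it is
applied here only to the source operand with a register destination, since
the message gives a different figure for a memory destination. The names are
made up for the example.

```python
# Times in nanoseconds, copied from the tables above as (11/750, 11/780).
# ASSUMPTION: instruction time = base time + source addressing-mode cost.
BASE_MOVL = (1000, 400)
SOURCE_MODE_COST = {
    "r":     (0, 0),
    "(r)":   (330, 400),
    "x(r)":  (500, 400),
    "@x(r)": (1150, 1000),
    "[r]":   (1000, 600),
}

def movl_time(src_mode):
    """Estimated (750, 780) time in ns for `movl src,r` (register destination)."""
    cost_750, cost_780 = SOURCE_MODE_COST[src_mode]
    return BASE_MOVL[0] + cost_750, BASE_MOVL[1] + cost_780

for mode in ("r", "(r)", "x(r)"):
    t750, t780 = movl_time(mode)
    print(f"movl {mode},r: {t750} ns on the 750, {t780} ns on the 780 "
          f"({100 * t780 / t750:.0f}% of 780 speed)")
```

For movl (r),r the model gives 1330 ns and 800 ns, matching the measured
entry in the instruction timings.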
∂06-Apr-81 1153 RPG Times
                        kl      780     750     fv1     s1      2080
peak scalar             3       1       0.6     2.8     20      24
ave logic               1.9     0.8     0.5     2.0     ?       9.5
float                   0.5     0.6     0.2     1.0     80      7 (claimed 12?)
 (dbl floating mult)
Peak scalar is instruction drain rate (no memory fetches), ave logic is
non-arithmetic with memory fetches. Floating is floating peak (drain rate),
in MIPS. kl is the KL (model A CPU); fv1 is a super-VAX (not announced).
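
To compare these figures with the 750 vs 780 percentages elsewhere in this
file, each row can be normalized so that the 780 is 1.0. A minimal sketch,
with the figures copied from the table above; the unknown s1 entry is kept as
None, and the names are made up for the example.

```python
# MIPS figures copied from the table above; None marks the "?" entry.
machines = ["kl", "780", "750", "fv1", "s1", "2080"]
mips = {
    "peak scalar": [3, 1, 0.6, 2.8, 20, 24],
    "ave logic":   [1.9, 0.8, 0.5, 2.0, None, 9.5],
    "float":       [0.5, 0.6, 0.2, 1.0, 80, 7],
}

def relative_to_780(row):
    """Divide every entry in a row by that row's 780 figure."""
    base = row[machines.index("780")]
    return [None if value is None else value / base for value in row]

for name, row in mips.items():
    cells = ["   ?  " if v is None else f"{v:6.2f}" for v in relative_to_780(row)]
    print(f"{name:12s}" + " ".join(cells))
```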
-rpg-
∂29-Sep-81 1104 CSVAX.fateman at Berkeley Re: Timing
Date: 28 Sep 1981 16:45:34-PDT
From: CSVAX.fateman at Berkeley
To: RPG@SU-AI
Subject: Re: Timing
We ran a particular demo file for VAX macsyma (available as
mit-mc:demo;begin demo) on the 11/780 (low usage) and on
the 11/750 (single user) and found that the 11/780 did the job
in 73% of the reported CPU time of the 11/750. This was not
done with equal amounts of memory or identical disks; I believe
those factors would tend to favor the 780. This is not
the DEC "SUVAX" of earlier times, which is considerably slower than
the 750. I suspect that with the floating point accelerator, the
750 would be quite nice; especially as memory up to 8 megabytes
will be available with 64K RAM chips.
∂05-Oct-81 1245 SL
A VAX system is requested to handle the capabilities of ACRONYM, the integrated
vision system. ACRONYM now includes 2MB of system.
We expect it to expand as we add database facilities and new code.
It currently accommodates only small pictures, 256x256. With large pictures,
the address space will be
much larger. Current usage of the group is approximately 72% of a VAX 11/780,
or 125% of a VAX 11/750. Typical experiments require 10 minutes now on a KL/10,
equivalent to 30 minutes on a VAX 11/780. Although it is planned to make ACRONYM
more efficient, it is essential that execution time be in the range of
5 minutes for adequate debugging.
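
The figures in the paragraph above imply a few ratios that are worth making
explicit. The sketch below only restates that arithmetic. The estimated run
time on a 750 is an extrapolation from the usage percentages and is an
assumption, not a measurement; the variable names are made up for the
example.

```python
# Figures quoted in the paragraph above.
usage_of_780 = 0.72      # group usage as a fraction of one VAX 11/780
usage_of_750 = 1.25      # the same usage as a fraction of one VAX 11/750
kl10_minutes = 10        # typical experiment on the KL/10
vax780_minutes = 30      # stated equivalent on a VAX 11/780
target_minutes = 5       # run time wanted for adequate debugging

# Implied 750 speed as a fraction of 780 speed.
implied_750_speed = usage_of_780 / usage_of_750

# How much faster the KL/10 is than the 780 for this workload.
kl10_vs_780 = vax780_minutes / kl10_minutes

# Speedup ACRONYM would need on a 780 to reach the target.
speedup_needed_on_780 = vax780_minutes / target_minutes

# ASSUMPTION: experiment time scales inversely with the implied speed ratio.
minutes_on_750 = vax780_minutes / implied_750_speed

print(f"implied 750 speed: {implied_750_speed:.3f} of a 780")
print(f"KL/10 is {kl10_vs_780:.1f}x a 780 on this workload")
print(f"speedup needed on a 780: {speedup_needed_on_780:.1f}x")
print(f"estimated minutes on a 750: {minutes_on_750:.1f}")
```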
Compute power is important, hence the floating point accelerator (3% of total cost)
and interleaved memory (a second memory controller, 7% of total cost).
For multiple users of large LISP systems, large memory is essential
because both VMS and UNIX impose heavy penalties in paging.
Fortunately memory is inexpensive (9.5% of total cost).
Current disk usage is estimated at 300 megabytes for reasonable functioning.
That includes inadequate storage for pictures.
A second disk is essential for adequate system performance in paging to avoid
disk contention, and a second controller makes a noticeable difference in
performance. The large disk costs only 10% more than a smaller disk and provides
room for growth and for storage of data base and images.
Pictures and archival storage will
use a tape drive along with storage at SAIL over the network.